Skip to content

Save all volatile argument registers in GC probe hijack frames on AMD64 and ARM64#126848

Open
Copilot wants to merge 10 commits intomainfrom
copilot/fix-fast-tail-call-candidates-test
Open

Save all volatile argument registers in GC probe hijack frames on AMD64 and ARM64#126848
Copilot wants to merge 10 commits intomainfrom
copilot/fix-fast-tail-call-candidates-test

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 13, 2026

Description

CrossGen2 crashes with AV in SVR::GCHeap::Promote when the GC hijacks a thread at a point where volatile argument registers contain live GC references. The hijack stub's PInvokeTransitionFrame didn't save these registers, so StackFrameIterator found NULL save locations when GC info reported them as live.

Windows AMD64: FixupHijackedCallstack used RDX for the thread pointer and R8 as scratch, destroying their original values. Only RAX/RCX were saved. Only xmm0 was preserved for floating point.

Unix AMD64: RSI, RDI, R8, and R9 were destroyed by the INLINE_GETTHREAD function call and never saved. On the System V ABI, RSI and RDI are volatile argument-passing registers (1st and 2nd integer args) that could contain live GC references at hijack points. On Windows, RSI/RDI are already covered as callee-saved registers in PTFF_SAVE_ALL_PRESERVED. Only xmm0/xmm1 were preserved for floating point.

ARM64 (Windows and Unix): FixupHijackedCallstack used x3 as trash and x4 for the thread pointer, destroying their original values. Only x0-x2 were saved in the probe frame. Volatile argument registers x3-x7 could contain live GC references at hijack points but were not preserved. Only q0-q3 were preserved for floating point.

JIT interface resolution: RBM_INTERFACELOOKUP_FOR_SLOT_TRASH did not correctly indicate that all non-argument volatile registers (including floating point and mask registers) are trashed, which could cause the JIT to assume non-argument volatile registers survive the resolution helper.

AMD64 Changes

  • FixupHijackedCallstack (Windows): Use R10/R11 instead of RDX/R8 for thread pointer and scratch, preserving all volatile argument registers
  • FixupHijackedCallstack (Unix): Save/restore RSI, RDI, R8, R9, and xmm0-xmm7 around INLINE_GETTHREAD using CFI-aware push_register/pop_register/alloc_stack/free_stack macros; use R10 instead of R8 for scratch
  • PUSH_PROBE_FRAME (Windows): Save/restore RDX, R8, R9 in correct m_PreservedRegs order matching StackFrameIterator flag processing; save xmm0-xmm3 (was xmm0 only)
  • PUSH_PROBE_FRAME (Unix): Save/restore RSI, RDI, RDX, R8, R9 in correct m_PreservedRegs order matching StackFrameIterator flag processing (RSI/RDI placed between R12 and RBX) using push_register macros; save xmm0-xmm7 (was xmm0-xmm1 only)
  • POP_PROBE_FRAME (Windows): Restore RDX, R8, R9 and xmm0-xmm3
  • POP_PROBE_FRAME (Unix): Restore RSI, RDI, RDX, R8, R9 and xmm0-xmm7 using pop_register/pop_nonvol_reg/free_stack macros for proper CFI unwind directives
  • Hijack flags (Windows): Include PTFF_SAVE_RDX + PTFF_SAVE_R8 + PTFF_SAVE_R9
  • Hijack flags (Unix): Include PTFF_SAVE_RSI + PTFF_SAVE_RDI + PTFF_SAVE_RDX + PTFF_SAVE_R8 + PTFF_SAVE_R9
  • Flag definitions: Add PTFF_SAVE_RDX, PTFF_SAVE_R8, PTFF_SAVE_R9 to AsmMacros.inc; add PTFF_SAVE_RSI, PTFF_SAVE_RDI, PTFF_SAVE_R8, PTFF_SAVE_R9 to unixasmmacrosamd64.inc

ARM64 Changes

  • FixupHijackedCallstack (Windows): Use x9/x10 instead of x4/x3 for thread pointer and scratch, preserving all volatile argument registers x0-x7
  • FixupHijackedCallstack (Unix): Use INLINE_GETTHREAD x9 (non-ETLS) or save/restore x0-x7 and q0-q7 around RhpGetThread call (ETLS path), preserving all argument registers
  • PUSH_PROBE_FRAME: Save x0-x7 (was x0-x2) and q0-q7 (was q0-q3) in correct m_PreservedRegs order matching StackFrameIterator flag processing
  • POP_PROBE_FRAME: Restore x0-x7 (was x0-x2) and q0-q7 (was q0-q3)
  • Frame layout: Increased to accommodate x3-x7 and q4-q7 storage with proper 16-byte alignment for q registers
  • Hijack flags: Include PTFF_SAVE_X3 + PTFF_SAVE_X4 + PTFF_SAVE_X5 + PTFF_SAVE_X6 + PTFF_SAVE_X7
  • Flag definitions: Add PTFF_SAVE_X3 through PTFF_SAVE_X7 to unixasmmacrosarm64.inc (Windows AsmMacros.h already had them)
  • GC stress paths: Updated to use x9/x10 and include x3-x7 flags

JIT Changes

  • RBM_INTERFACELOOKUP_FOR_SLOT_TRASH (AMD64): Updated to (RBM_CALLEE_TRASH & ~(RBM_ARG_REGS | RBM_FLTARG_REGS)) — trashes all non-argument volatile registers (integer, floating point, and mask), ensuring the JIT does not assume any non-argument volatile registers survive the interface resolution helper
  • RBM_INTERFACELOOKUP_FOR_SLOT_TRASH (ARM64): Updated to (RBM_CALLEE_TRASH & ~(RBM_ARG_REGS | RBM_FLTARG_REGS)) — trashes all non-argument volatile registers (integer x8-x15/IP0/IP1/LR, non-argument floating point, and mask registers)

… frames

On AMD64, when the GC hijacks a thread via RhpGcProbeHijack, the hijack stub
creates a PInvokeTransitionFrame. Previously on Windows, only RAX and RCX were
saved, and on Unix only RAX, RCX, and RDX were saved. If the GC info for the
managed frame at the hijack point reports other volatile registers (RDX, R8, R9)
as live GC references, the StackFrameIterator would find NULL save locations
and crash (AV in SVR::GCHeap::Promote).

Windows changes:
- FixupHijackedCallstack now uses R10/R11 for thread pointer instead of RDX/R8,
  preserving all volatile argument registers (RAX, RCX, RDX, R8, R9)
- PUSH_PROBE_FRAME/POP_PROBE_FRAME extended to save/restore RDX, R8, R9
- RhpGcProbeHijack flags include PTFF_SAVE_RDX + PTFF_SAVE_R8 + PTFF_SAVE_R9
- AsmMacros.inc gains PTFF_SAVE_RDX, PTFF_SAVE_R8, PTFF_SAVE_R9 definitions

Unix changes:
- FixupHijackedCallstack now also saves/restores R8 and R9 across GETTHREAD
- Uses R10 instead of R8 for scratch in hijack fixup and bitmask passing
- PUSH_PROBE_FRAME/POP_PROBE_FRAME extended to save/restore R8, R9
- RhpGcProbeHijack flags include PTFF_SAVE_R8 + PTFF_SAVE_R9
- unixasmmacrosamd64.inc gains PTFF_SAVE_R8, PTFF_SAVE_R9 definitions

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/411be3e7-43a1-4bc3-b896-e27d92fe37c3

Co-authored-by: mangod9 <61718172+mangod9@users.noreply.github.com>
Copilot AI requested review from Copilot and removed request for Copilot April 13, 2026 19:18
Copilot AI requested review from Copilot and removed request for Copilot April 13, 2026 19:25
Copilot AI changed the title [WIP] Fix test failure for FastTailCallCandidates Save all volatile argument registers (RDX, R8, R9) in GC probe hijack frames on AMD64 Apr 13, 2026
Copilot AI requested a review from mangod9 April 13, 2026 19:26
On Unix System V ABI, RSI and RDI are volatile argument-passing registers
(1st and 2nd integer args) that could contain live GC references at hijack
points. On Windows, they are already saved as callee-saved registers in
PTFF_SAVE_ALL_PRESERVED.

Unix changes:
- FixupHijackedCallstack now saves/restores RSI and RDI across GETTHREAD
- PUSH_PROBE_FRAME saves RSI and RDI between R12 and RBX, matching the
  StackFrameIterator's flag processing order (RBX, RSI, RDI, R12...)
- POP_PROBE_FRAME restores RSI and RDI in correct order
- RhpGcProbeHijack flags include PTFF_SAVE_RSI + PTFF_SAVE_RDI
- unixasmmacrosamd64.inc gains PTFF_SAVE_RSI and PTFF_SAVE_RDI definitions

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/85ca9ec1-057b-4a3c-8182-e6da274d44b6

Co-authored-by: mangod9 <61718172+mangod9@users.noreply.github.com>
Copilot AI requested review from Copilot and removed request for Copilot April 13, 2026 21:27
Copilot AI changed the title Save all volatile argument registers (RDX, R8, R9) in GC probe hijack frames on AMD64 Save all volatile argument registers in GC probe hijack frames on AMD64 Apr 13, 2026
@jkotas jkotas marked this pull request as ready for review April 13, 2026 21:43
Copilot AI review requested due to automatic review settings April 13, 2026 21:43
@jkotas
Copy link
Copy Markdown
Member

jkotas commented Apr 14, 2026

Is saving argument registers enough?

IIRC, we have only discussed preserving argument registers in #122639. None of the asm helpers are not setup to save all argument registers.

Is it a problem to restrict the JIT to only assume argument registers are preserved around the resolve helper?

@jkotas
Copy link
Copy Markdown
Member

jkotas commented Apr 14, 2026

Also, I assume that the JIT expects floating point argument registers to be preserved. Is that right?

The helper changed in this PR is only preserving xmm0. It should be fixed to preserve all arg registers.

@jakobbotsch
Copy link
Copy Markdown
Member

IIRC, we have only discussed preserving argument registers in #122639. None of the asm helpers are not setup to save all argument registers.

Is it a problem to restrict the JIT to only assume argument registers are preserved around the resolve helper?

No, we can just update RBM_INTERFACELOOKUP_FOR_SLOT_TRASH to indicate that all non-argument registers are trashed. I think you are right I have talked about argument registers when we were discussing this.

@copilot Please update this PR to change RBM_INTERFACELOOKUP_FOR_SLOT_TRASH for all targets such that non-argument volatile registers are always trashed.

Also, I assume that the JIT expects floating point argument registers to be preserved. Is that right?

Yes, that is right.

@jkotas
Copy link
Copy Markdown
Member

jkotas commented Apr 14, 2026

Also, I assume that the JIT expects floating point argument registers to be preserved. Is that right?
Yes, that is right.

@copilot Please update this PR to preserve floating point argument registers in these asm helpers.

…gisters in probe frames

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/2bbc56c9-47e8-42f8-903e-51a79bbb96c6

Co-authored-by: jakobbotsch <7887810+jakobbotsch@users.noreply.github.com>
Copilot AI requested review from Copilot and removed request for Copilot April 14, 2026 15:14
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 14, 2026

IIRC, we have only discussed preserving argument registers in #122639. None of the asm helpers are not setup to save all argument registers.

Is it a problem to restrict the JIT to only assume...

Done in b1486c3.

RBM_INTERFACELOOKUP_FOR_SLOT_TRASH:

  • AMD64: Changed to (RBM_INT_CALLEE_TRASH_INIT & ~RBM_ARG_REGS) — evaluates to the same RBM_RAX | RBM_R10 | RBM_R11 on both ABIs but is now self-documenting
  • ARM64: Changed to (RBM_INT_CALLEE_TRASH & ~RBM_ARG_REGS) — expands to include R8-R11, IP0, IP1, LR in addition to the previous R12-R15

FP argument registers: Updated all GcProbe files to save/restore all FP arg registers:

  • Windows AMD64: xmm0-xmm3 (was xmm0)
  • Unix AMD64: xmm0-xmm7 (was xmm0-xmm1), also saved/restored across INLINE_GETTHREAD in FixupHijackedCallstack since it makes blr/call that trash all volatile FP regs
  • ARM64 (both): q0-q7 (was q0-q3), frame size increased from 0x100 to 0x140
  • ARM64 Unix: q0-q7 saved/restored across INLINE_GETTHREAD/RhpGetThread in FixupHijackedCallstack

Also, I assume that the JIT expects floating point argument registers to be preserved. Is that right?

The helper changed in this PR is only preserving xmm0. It should be fixed to preserve all arg ...

Fixed in b1486c3. All FP arg registers are now preserved: xmm0-3 (Windows AMD64), xmm0-7 (Unix AMD64), q0-q7 (ARM64). On Unix, they're also saved/restored across INLINE_GETTHREAD in FixupHijackedCallstack since the TLS accessor makes function calls that trash volatile FP regs.

Copilot AI requested review from Copilot and removed request for Copilot April 14, 2026 15:18
Copilot AI requested a review from jakobbotsch April 14, 2026 15:20
…egisters including float

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/24a8bc8a-0f2f-49c9-b96a-268d92070834

Co-authored-by: jakobbotsch <7887810+jakobbotsch@users.noreply.github.com>
Copilot AI requested review from Copilot and removed request for Copilot April 14, 2026 15:43
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 14, 2026

Also, I assume that the JIT expects floating point argument registers to be preserved. Is that right?
Yes, that is right.

...

Already done in b1486c3. All FP arg registers are preserved: xmm0-3 (Windows AMD64), xmm0-7 (Unix AMD64), q0-q7 (ARM64). On Unix, they're also saved/restored across INLINE_GETTHREAD in FixupHijackedCallstack.

@jkotas
Copy link
Copy Markdown
Member

jkotas commented Apr 14, 2026

/azp run runtime-nativeaot-outerloop

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Test failure: JIT/opt/FastTailCall/FastTailCallCandidates/FastTailCallCandidates.cmd

6 participants